Ongoing Developments in Automatically Adapting Lexical Resources to the Biomedical Domain

Authors

  • Dominic Widdows
  • Adil Toumouh
  • Beate Dorow
  • Ahmed Lehireche
Abstract

This paper describes a range of experiments using empirical methods to adapt the WordNet noun ontology for specific use in the biomedical domain. Our basic technique is to extract relationships between terms using the Ohsumed corpus, a large collection of abstracts from PubMed, and to compare the relationships extracted with those that would be expected for medical terms, given the structure of the WordNet ontology. The linguistic methods involve the use of a variety of lexicosyntactic patterns that enable us to extract pairs of coordinate noun terms, and also related groups of adjectives and nouns, using Markov clustering. This enables us in many cases to analyse ambiguous words and select the correct meaning for the biomedical domain. While results are often encouraging, the paper also highlights evident problems and drawbacks with the method, and outlines suggestions for future work.
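
The two ingredients named in the abstract, pattern-based extraction of coordinate terms and Markov clustering, can be illustrated with a minimal sketch. This is not the authors' implementation: the regular expression below is a deliberately crude stand-in for their lexicosyntactic patterns (it matches any "X and Y" or "X or Y" word pair without part-of-speech filtering), and the function names `coordinate_pairs` and `markov_cluster`, the inflation value, and the iteration count are all assumptions made for illustration.

```python
import re

# Assumption: a crude stand-in for a lexicosyntactic pattern. It matches
# "X and Y" / "X or Y" on single lowercase words and does no noun filtering.
COORD = re.compile(r"\b([a-z]+) (?:and|or) ([a-z]+)\b")


def coordinate_pairs(sentences):
    """Return (left, right) word pairs joined by 'and' or 'or'."""
    pairs = []
    for sentence in sentences:
        pairs.extend(COORD.findall(sentence.lower()))
    return pairs


def _normalize_columns(m):
    """Rescale every column of a square matrix so that it sums to 1."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def _square(m):
    """Multiply a square matrix by itself (the MCL expansion step)."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def markov_cluster(pairs, inflation=2.0, iterations=30):
    """Cluster the co-occurrence graph of the pairs with a basic MCL loop."""
    nodes = sorted({word for pair in pairs for word in pair})
    index = {word: i for i, word in enumerate(nodes)}
    n = len(nodes)
    # Adjacency matrix with self-loops; edge weights count co-occurrences.
    m = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for a, b in pairs:
        m[index[a]][index[b]] += 1.0
        m[index[b]][index[a]] += 1.0
    m = _normalize_columns(m)
    for _ in range(iterations):
        m = _square(m)  # expansion: flow spreads along paths
        # inflation: strong flows are boosted, weak flows decay
        m = _normalize_columns([[x ** inflation for x in row] for row in m])
    # Each row that still carries flow lists the members of one cluster.
    clusters = set()
    for row in m:
        members = frozenset(nodes[j] for j, x in enumerate(row) if x > 1e-6)
        if members:
            clusters.add(members)
    return clusters


if __name__ == "__main__":
    sample = [
        "heart and lung",
        "lung or kidney",
        "heart or kidney",
        "aspirin and ibuprofen",
    ]
    for cluster in markov_cluster(coordinate_pairs(sample)):
        print(sorted(cluster))
```

In a setting like the one the abstract describes, the resulting clusters could then be compared against the WordNet senses of an ambiguous word to see which sense fits its biomedical neighbours. That comparison step is not shown here.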

Related resources

Automatically Adapting an NLP Core Engine to the Biology Domain

Background: Rather than specifying rules, constraints and lexicons for NLP systems manually, we advocate a procedure for automatically acquiring linguistic knowledge using machine learning (ML) methods. In order to demonstrate how feasible this approach is, we automatically adapt OpenNLP, an open source ML-based NLP tool suite, to the sublanguage domain of biology. Results: In the first evaluat...

Bootstrapping a Verb Lexicon for Biomedical Information Extraction

The extraction of information from texts requires resources that contain both syntactic and semantic properties of lexical units. As the use of language in specialized domains, such as biology, can be very different to the general domain, there is a need for domain-specific resources to ensure that the information extracted is as accurate as possible. We are building a large-scale lexical resou...

Data Integration through data elements: Mapping data elements to terminological resources

Data integration is a crucial task in the biomedical domain. Data elements (DEs) play an important role in data integration and we propose to map DEs to terminological resources as an approach to data integration. We extracted DEs from eleven disparate biomedical sources. We compared these DEs to concepts and/or terms in biomedical controlled vocabularies and to reference DEs. We also exploited...

Onto.PT: recent developments of a large public domain Portuguese wordnet

This document describes the current state of Onto.PT, a new large wordnet for Portuguese, freely available, and created automatically after exploiting and integrating existing lexical resources in a wordnet structure. Besides an overview on Onto.PT, its creation and evaluation, we enumerate the developments of version 0.6. Moreover, we provide a quantitative view on this version, its comparison...

BioChain: Using Lexical Chaining Methods for Biomedical Text Summarization

Lexical chaining is a technique for identifying semantically-related terms in a text. It is useful in document summarization in order to identify the top sentences most likely to contain the main ideas of a document or document set. These top sentences are then extracted and combined in order to produce a summary of the document(s). To date, summarization work using lexical chains ha...

Publication date: 2006